Knowledge Graph and Chain-of-Thought
Enhanced Data Mining for Multi-modality Neuroscience
Abstract
Neuroscience still lacks a unified multimodal resource that
systematically integrates neuron morphology, projection, and transcriptomic
data. To fill this gap, we developed a large-scale database combining two state-of-the-art
brain atlases, 294 brain regions and 923 subregions, 182,483 neurons with
reconstructed 3-D arbors, and 1,122 genes profiled across 5.24 million cells.
On this foundation, we built NeuroXiv Knowledge Graph (NeuroXiv-KG), a
knowledge graph that encodes 34,771 nodes that correspond to brain regions,
subregions, neurons and transcriptomic cell types, and 252.6 million
cross-modality relationships, capturing the complex associations among
molecular, morphological, and neuronal connection domains. Further, we
introduced AI Powered Open Mining with Chain of Thought (AIPOM-CoT), a
schema-adaptive chain of thought agent that converts natural language prompts
into multi-step analytical workflows involving graph retrieval, statistics, and
provenance tracking. AIPOM-CoT can interpret a biologist’s question, execute
multi-stage reasoning, and return automated and reproducible results. We
demonstrate its performance through two applications: (1) an analysis of
Car3-positive neurons that identifies their subclasses, anatomical
localization, projection networks, and molecular fingerprints; and (2) a whole-brain tri-modal fingerprint map
that links molecular, morphological, and projection profiles and systematically
ranks pairwise agreement and mismatch—revealing where molecular patterns do
or do not predict morphology or connectivity. Together, NeuroXiv-KG and
AIPOM-CoT provide a scalable AI platform for cross-modality reasoning,
accelerating discovery in neuroscience.
Introduction
Over the past decade, major
mouse-brain datasets have expanded across modalities. Common coordinate
frameworks such as the Allen Mouse Brain CCFv3 provide a population-average
anatomical reference spanning hundreds of regions [1]; mesoscale projection resources
and serial two-photon tomography measure long-range wiring under standardized
protocols [2,3]; large collections of single-neuron reconstructions now cover
broad brain territories and reveal both stereotypy and diversity across
putative transcriptomic classes [6]; and whole-brain spatial transcriptomics
places thousands of transcriptomic types into anatomical space [4,5]. Our
earlier NeuroXiv 1.0 aggregated parts of these streams to facilitate
exploratory analyses at scale. Yet, despite this progress, these resources
largely live in separate silos—with heterogeneous schemas, coordinate systems,
and access patterns—so that unified cross-modal integration at the region
level, and especially at single-neuron granularity, remains difficult.
Spatial granularity further
complicates synthesis. The Mouse Brain Atlas of Dendritic Microenvironments
(CCF-ME) subdivides parcels using local dendritic context (>100k neurons),
improving anatomical discrimination and correlating with projection specificity
while remaining compatible with CCF space [8]. Together, CCFv3 (population
reference) and CCF-ME (morphology-informed fine parcellation) provide
complementary coordinates on which morphology, projections, and spatial
transcriptomics could, in principle, be analyzed jointly [1,8]. What has been
missing is a large, unified multimodal database that systematically brings
these modalities into one substrate and exposes explicit cross-links
(membership, projection, composition) so that region-level questions can be
asked and answered directly.
Within this landscape, existing
ecosystems each address important pieces. The Allen Brain Knowledge Portal /
Cell Type Knowledge Explorer curate high-quality cell-type resources with
excellent browsing [22]. Blue Brain Nexus offers a powerful, generic KG/data-management
backbone (RDF/ontologies) focused on FAIR modeling and versioning [23]. The
BrainGlobe suite streamlines 3D imaging workflows for detection, registration,
atlas mapping, and visualization [9,10]. Beyond mouse neuroanatomy, human
meta-analytic text-to-map tools (NeuroSynth, NeuroQuery) automate
language-to-map associations [17,18], and generic LLM-agent frameworks (ReAct,
Toolformer, AutoGPT, LangChain/LangGraph) demonstrate planning and tool use
[15,16,19–21]. Taken together, however, none of these provide a single,
region-level system that unifies morphology, projections, and spatial
transcriptomics in one graph and turns natural-language questions into audited,
reproducible analyses.
To fill this gap, we assembled a
large-scale database anchored to two state-of-the-art atlases (CCFv3 and
CCF-ME), comprising 294 brain regions and 923 subregions, 182,483 neurons with
reconstructed 3-D arbors, and 1,122 genes profiled across 5.24 million cells.
On this foundation we built the NeuroXiv Knowledge Graph (NeuroXiv-KG), which
encodes 34,771 nodes (anatomical parcels and ME subparcels, reconstructed
neurons, and transcriptomic tiers—Class/Subclass/Supertype/Cluster) and 252.6
million cross-modality relationships capturing explicit membership, projection,
and composition links [1,4,8]. We then introduce AI-Powered Open Mining with
Chain of Thought (AIPOM-CoT), a schema-adaptive agent that compiles a
biologist’s prompt into multi-step analytical workflows involving graph
traversal and statistical estimation (effect sizes, confidence intervals,
permutation-based p with FDR correction), with full provenance capture
(snapshot/seed, tool list, parameters, inputs/outputs). We demonstrate two
representative applications: a live Car3-positive analysis that identifies
subclasses, localizes anatomical pockets, maps projection networks, and
profiles molecular fingerprints of repeatedly hit targets; and a global
fingerprint survey that assigns tri-modal (molecular, morphology, projection)
fingerprints to every region to derive similarity structure and divergence
(mismatch) pairs. Together, NeuroXiv-KG and AIPOM-CoT provide the unified
multimodal resource and analysis capability needed to move from disparate datasets
to transparent, reproducible, and testable cross-modal insight.
Results
Integrating multi-modal neuroscience data into a unified knowledge
graph with AI-powered automated mining
Figure
1 | Integrating multi-modal neuroscience data into a
unified knowledge graph with AI-powered automated mining. (A) Conceptual
framework for unifying four complementary data modalities—anatomical structure,
cellular morphology, axonal connectivity, and molecular composition—into a
single queryable knowledge graph. These modalities provide different
perspectives on brain organization: anatomy defines spatial boundaries and
hierarchical relationships; morphology captures dendritic and axonal arbor
geometry; connectivity maps circuit wiring through axonal projections; and
molecular data reveals cellular identity through transcriptomic profiles.
Traditional neuroscience workflows require navigating separate databases and
manually integrating across modalities. Our system unifies these data types
within a structured knowledge representation, enabling automated cross-modal
queries and systematic comparative analyses that were previously intractable. (B)
Comprehensive data integration pipeline. Top panels show representative
examples of single-neuron morphological reconstructions paired with
connectivity matrices, curated from multiple sources including the Allen Brain
Atlas, MouseLight, and other community repositories.
Middle panels display two state-of-the-art 3D anatomical reference
atlases—CCFv3 (Allen Common Coordinate Framework version 3) and CCF-ME (the
morphology-informed microenvironment parcellation)—providing complementary
spatial registration frameworks and ensuring anatomical consistency. Bottom panels illustrate
high-resolution spatial transcriptomic data from MERFISH (multiplexed
error-robust fluorescence in situ hybridization), capturing single-cell
molecular profiles with spatial coordinates. The integrated platform harmonizes
morphological reconstructions (~34,000 neurons), connectivity matrices,
anatomical parcellations (~300 brain regions), and spatial transcriptomic
datasets (>4.5 million cells) within a unified coordinate system, creating a
multi-scale representation spanning molecular, cellular, and systems levels. (C)
NeuroXiv-KG schema and scale. The knowledge graph comprises 8 node types
representing biological entities (neurons, brain regions, cell types, genes,
etc.) and 11 edge types encoding relationships (neuronal projections, regional
containment, gene expression, morphological features, etc.). The current
instantiation contains 34,771 nodes interconnected by over 258 million edges,
forming a richly connected semantic network. Colored nodes illustrate different
entity classes: anatomical regions (blue), neurons (green), molecular markers
(purple), and cell-type clusters (pink). Edge types include both explicitly
asserted relationships (e.g., "LOCATE_AT" connecting neurons to brain
regions, "PROJECT_TO" encoding axonal projections) and
computationally derived links (e.g., "HAS_CLASS" from cell-type
clustering, "BELONG_TO" for hierarchical containment). This
structured representation transforms heterogeneous neuroscience data into a
machine-readable format amenable to automated reasoning and systematic
discovery. (D) AIPOM-CoT agent architecture for schema-adaptive automated
analysis. The agent employs a cognitive loop integrating large language models
(LLMs) with knowledge graph operations. Upon receiving a user query in natural
language, the Think module generates a chain-of-thought reasoning plan,
decomposing complex questions into executable sub-tasks while dynamically
selecting appropriate knowledge graph schemas and query patterns. The Act
module translates planned steps into concrete knowledge graph retrievals and
computational operations, accessing the unified data repository. The
Observe module evaluates retrieved results, performing statistical analyses
and extracting insights from multi-modal data. The Reflect module assesses
progress toward answering the original query, identifies gaps or
inconsistencies, and generates follow-up reasoning steps, creating an iterative
refinement loop. This architecture enables the system to handle open-ended
biological questions without pre-programmed workflows, automatically
determining which data modalities to query, how to integrate them, and when the
assembled evidence sufficiently addresses the question. Critically, all
reasoning steps, data retrievals, and computational operations are logged with
full provenance, ensuring reproducibility and enabling users to validate or
refine the automated analysis.
Modern
neuroscience generates data across multiple complementary scales and
modalities: spatial transcriptomics reveals molecular cell-type identity at
single-cell resolution, morphological reconstructions capture dendritic and
axonal arbor geometry, connectivity atlases map circuit wiring through
projection patterns, and anatomical reference frameworks provide hierarchical
spatial organization. Each modality offers a distinct lens on brain
organization, yet they reside in separate repositories with heterogeneous
formats, annotations, and coordinate systems. Integrating across these
modalities remains a manual, labor-intensive process requiring expert knowledge
of multiple databases, custom data parsing scripts, and ad hoc procedures for
cross-referencing entities—a workflow that scales poorly as datasets grow and
becomes a bottleneck for hypothesis generation and systematic comparative
analyses. Despite the availability of rich data resources, the lack of unified,
machine-readable integration has confined most neuroscience analyses to
single-modality studies or small-scale manual integration efforts, leaving
cross-modal patterns largely unexplored.
To
address this integration challenge, we developed NeuroXiv-KG, a comprehensive
knowledge graph unifying four fundamental data modalities—molecular
composition, cellular morphology, circuit connectivity, and anatomical
organization—within a single structured semantic network (Figure 1A-C). The
knowledge graph integrates single-neuron morphological reconstructions from
multiple sources including MouseLight and the Allen
Brain Atlas (~34,000 neurons with detailed dendritic and axonal arbors),
spatial transcriptomic datasets from MERFISH capturing molecular profiles of
>4.5 million cells across the mouse brain, axonal projection connectivity
matrices spanning hundreds of brain regions, and 3D anatomical reference
atlases (CCFv3 and CCF-ME) providing complementary spatial registration and
ensuring anatomical consistency (Figure 1B).
We
harmonized these heterogeneous data types within a unified coordinate framework
and structured them using a formal schema comprising 8 node types (representing
biological entities such as neurons, brain regions, cell types, genes, and
morphological features) and 11 edge types (encoding relationships including
neuronal projections, regional containment, gene expression, and morphological
properties) (Figure 1C). The resulting knowledge graph contains 34,771 nodes
interconnected by over 258 million edges, forming a richly connected semantic
network where queries can traverse across modalities—for example, starting from
a molecular marker, identifying enriched cell types, locating brain regions,
retrieving morphological reconstructions, and mapping their projection targets.
This structured representation transforms fragmented neuroscience datasets into
a unified, machine-readable resource amenable to systematic exploration and
automated reasoning.
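The cross-modal traversal described above (marker → cell types → regions → reconstructions → targets) amounts to a walk over typed edges. The miniature in-memory sketch below is purely illustrative: the edge types MARKER_OF, ENRICHED_IN, and CONTAINS are hypothetical stand-ins for the schema's typed relations, and the production system compiles such walks to graph-database queries rather than Python loops.

```python
from collections import defaultdict

class MiniKG:
    """Toy typed-edge store; the real KG lives in a graph database."""
    def __init__(self):
        self.edges = defaultdict(list)  # (edge_type, source) -> [targets]

    def add(self, src, edge_type, dst):
        self.edges[(edge_type, src)].append(dst)

    def traverse(self, start, path):
        """Follow a sequence of edge types from `start`; return reachable nodes."""
        frontier = [start]
        for edge_type in path:
            frontier = [dst for node in frontier
                        for dst in self.edges[(edge_type, node)]]
        return frontier

kg = MiniKG()
# marker gene -> subclass -> enriched region -> neuron -> projection target
kg.add("Car3", "MARKER_OF", "003 CLA-EPd-CTX Car3 Glut")
kg.add("003 CLA-EPd-CTX Car3 Glut", "ENRICHED_IN", "CLA")
kg.add("CLA", "CONTAINS", "neuron_001")
kg.add("neuron_001", "PROJECT_TO", "ENTl")
kg.add("neuron_001", "PROJECT_TO", "MOs")

targets = kg.traverse("Car3", ["MARKER_OF", "ENRICHED_IN", "CONTAINS", "PROJECT_TO"])
# targets == ["ENTl", "MOs"]
```

Each hop changes modality (molecular, anatomical, morphological, connectional), which is exactly what a single-modality repository cannot express.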
While
knowledge graphs provide unified data representation, extracting meaningful
insights still requires expertise in graph query languages and domain knowledge
to formulate appropriate questions. To enable automated,
natural-language-driven analysis, we developed AIPOM-CoT (AI-Powered Open
Mining with Chain-of-Thought), a reasoning agent that translates open-ended
biological questions into multi-step analysis workflows (Figure 1D). The agent
employs a cognitive architecture integrating large language models with
knowledge graph operations through four interconnected modules: (1) Think —
generates chain-of-thought reasoning plans that decompose complex queries into
executable sub-tasks while dynamically selecting appropriate knowledge graph
schemas and query patterns; (2) Act — translates planned steps into concrete
knowledge graph retrievals, statistical computations, and multi-modal data
integration operations; (3) Observe — evaluates retrieved results, extracts
quantitative patterns, and assesses the biological significance of findings;
(4) Reflect — determines whether accumulated evidence sufficiently addresses
the original question, identifies gaps or inconsistencies, and generates
follow-up reasoning steps, creating an iterative refinement loop.
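The four-module cycle can be sketched as a simple control loop. This is a schematic under stated assumptions, not the agent's implementation: in the real system `think` invokes an LLM planner and `act` compiles steps to graph queries, whereas the stand-ins below only exercise the control flow.

```python
def run_agent(query, think, act, observe, reflect, max_iters=10):
    """Iterate Think -> Act -> Observe -> Reflect until Reflect signals done."""
    evidence, trace = [], []
    plan = think(query, evidence)                 # decompose into sub-tasks
    for _ in range(max_iters):
        results = [act(step) for step in plan]    # graph retrievals / stats
        evidence.extend(observe(results))         # calibrated findings
        trace.append({"plan": plan, "results": results})  # provenance log
        done, plan = reflect(query, evidence)     # stop, or emit a refined plan
        if done:
            break
    return evidence, trace

# Trivial stand-in modules, just to exercise the loop:
evidence, trace = run_agent(
    query="What can you tell me about Car3+ neurons?",
    think=lambda q, ev: ["find_subclasses", "rank_regions"],
    act=lambda step: f"result_of:{step}",
    observe=lambda results: results,
    reflect=lambda q, ev: (len(ev) >= 2, []),     # halt once evidence suffices
)
```

The key design point is that Reflect both decides termination and rewrites the plan, which is what makes the refinement iterative rather than a fixed pipeline.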
This
schema-adaptive architecture enables AIPOM-CoT to handle diverse question types
without pre-programmed workflows. Given a query such as "What can you tell
me about Car3+ neurons?", the agent autonomously determines the relevant
data modalities (transcriptomic profiles to identify Car3-expressing cell
types, regional distributions to locate enrichment, morphological data to
characterize structure, projection data to map connectivity), formulates
appropriate knowledge graph queries for each modality, integrates the retrieved
information across scales, and generates interpretable summaries with full
provenance. Critically, all reasoning steps, data retrievals, and computational
operations are logged, enabling users to validate automated analyses, adjust parameters,
or extend workflows—transforming atlas interaction from manual browsing of
pre-computed views to planned, reproducible, multi-modal investigations.
Together,
NeuroXiv-KG and AIPOM-CoT constitute a unified platform for automated
cross-modal neuroscience data mining at scale. The knowledge graph provides
comprehensive multi-modal data integration, while the reasoning agent enables
natural-language-driven, automated analysis workflows that adapt to the
structure and semantics of the underlying data. To demonstrate the system's
capabilities, we present two complementary applications: First, we show how
AIPOM-CoT automatically retrieves and integrates molecular, morphological, and
connectivity data in response to an open-ended query about Car3+ neurons,
assembling a comprehensive multi-modal neuronal profile without manual data
curation or pre-specified analysis scripts (Result 3). Second, we demonstrate systematic
whole-brain analysis, where the agent automatically constructs tri-modal
fingerprints across all major brain regions and discovers pervasive cross-modal
divergence patterns—revealing that approximately 40% of region pairs display
molecular-morphological or molecular-projection mismatches, suggesting
semi-independent organizational principles operating across different
biological scales (Result 4).
These
demonstrations illustrate how the integrated platform transforms neuroscience
atlas interaction from manual, single-modality browsing to automated,
cross-modal discovery. The system enables researchers to pose biological
questions in natural language and receive integrated, multi-scale answers
spanning molecular identity, cellular morphology, circuit connectivity, and
anatomical context—analyses that would require hours to days of manual effort
using conventional approaches. By providing machine-readable integration,
automated reasoning capabilities, and full provenance tracking, the platform
establishes a foundation for systematic, reproducible cross-modal neuroscience
at scale.
Inside
AIPOM-CoT: how natural-language questions become auditable analyses
Figure
2 | AIPOM-CoT: a schema-adaptive, evidence-seeking agent
built on NeuroXiv-KG. (A) Operator-ready computation
surfaces. Two live, queryable abstractions derived from the KG: a Region→Class/Subclass/Supertype/Cluster neighborhood
exposing typed taxonomy edges, and a Region–[PROJECT_TO]–Target egonet exposing
directed projection weights and provenance. These views are the substrates the
agent traverses and aggregates—no bespoke scripts. (B) Reasoning loop with
provenance. From a natural-language prompt the agent runs a four-stage loop:
Think—parse intent, build a task graph, inspect the KG schema, and choose
operators (traversal, aggregation, ranking, enrichment, correlation/partial
correlation, similarity, permutation tests, FDR); Act—bind operators to the
schema and compile to graph queries, execute with explicit parameters and
compute budgets, and return results plus metadata (n, thresholds,
snapshot/seed, query hash); Observe—append effect sizes, confidence intervals,
permutation-based p and FDR-adjusted q to an evidence buffer, perform
coverage/stability checks, and emit intermediate insights; Reflect—apply policy
rules (add covariates, sweep thresholds, switch metrics, expand context via
“Think Deeper,” or stop) until halting criteria are met. (C) Step templates.
Three generic recipes illustrate how plans are assembled: (1) ROI selection
(rank candidate regions/taxa under coverage constraints), (2) pattern profiling
(e.g., projection or morphology summaries with uncertainty calibration), and (3)
context expansion & controls (neighbor comparisons, confound checks, metric
stability sweeps). Each template records inputs, operator choices, parameters,
outputs, and provenance, enabling replay.
Our goal is to make
cross-modality reasoning executable and reproducible. AIPOM-CoT achieves this
by coupling schema-aware planning with operator binding, evidence tracking, and
policy-driven reflection (Fig. 2).
Schema introspection and
operator binding. Given a prompt, the agent first parses intent and constructs
a task graph. It inspects the live KG schema (node/edge types and attributes
such as PROJECT_TO, HAS_CLASS/SUBCLASS/SUPERTYPE/CLUSTER, LOCATE_AT, morphology
features) and binds operators from a fixed library: graph traversals and
aggregations; ranking and enrichment; correlation and partial correlation;
similarity on fingerprints (molecular, morphology, projection); permutation
tests with FDR control; and visualization primitives. Binding produces
parameterized queries (e.g., region scopes, tier pooling, target cutoffs)
compiled to graph operations.
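Operator binding can be pictured as pairing a fixed library of query templates with concrete parameters. The Cypher-style template text, operator names, and parameter names below are illustrative assumptions; only the edge types (PROJECT_TO, HAS_SUBCLASS, LOCATE_AT) follow the schema described in Figure 1C.

```python
OPERATOR_LIBRARY = {
    "projection_egonet": (
        "MATCH (r:Region {acronym: $region})-[p:PROJECT_TO]->(t:Region) "
        "WHERE p.weight >= $cutoff RETURN t.acronym, p.weight"
    ),
    "subclass_members": (
        "MATCH (r:Region {acronym: $region})<-[:LOCATE_AT]-(n:Neuron)"
        "-[:HAS_SUBCLASS]->(s:Subclass) RETURN s.name, count(n)"
    ),
}

def bind(operator, **params):
    """Bind a library operator to concrete parameters for execution."""
    if operator not in OPERATOR_LIBRARY:
        raise KeyError(f"unknown operator: {operator}")
    return {"operator": operator,
            "query": OPERATOR_LIBRARY[operator],
            "params": params}

bound = bind("projection_egonet", region="CLA", cutoff=0.05)
```

Because the planner only ever selects from this typed library, every compiled query is parameterized and loggable, which is what makes the downstream provenance trace possible.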
Execution kernel and
evidence buffer. During Act, the kernel runs the bound queries under explicit
compute budgets, returning results with sample sizes, thresholds, and a
snapshot/seed tagged by a query hash. In Observe, results are written to an
evidence buffer that stores numerical estimates (effect sizes, CIs, permutation
p, FDR q), coverage diagnostics, and the exact inputs/outputs for each step.
This buffer powers both intermediate reasoning and later replay.
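Two of the quantities written to the evidence buffer, permutation-based p-values and FDR-adjusted q-values, can be sketched with stdlib Python. Function names are ours, not the agent's API; the p-value is one-sided with add-one smoothing, and the q-values follow the standard Benjamini-Hochberg step-up procedure.

```python
import random

def permutation_p(observed, pool, stat, n_perm=1000, seed=0):
    """One-sided permutation p: fraction of shuffled statistics >= observed,
    with add-one smoothing; the seed is recorded for bit-for-bit replay."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = pool[:]
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted q-values (monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q, running_min = [0.0] * m, 1.0
    for rank, i in reversed(list(enumerate(order, start=1))):
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

Storing the seed alongside each p-value is what lets a replayed run reproduce the exact same permutation draws.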
Reflection policy and
halting. A policy layer analyzes the evidence buffer. If coverage is shallow,
the agent pools tiers or expands the search radius; if composition confounds
are detected, it adds covariates/partial correlations; if metrics are unstable,
it sweeps thresholds or switches similarity metrics; if context is
insufficient, it invokes a “Think Deeper” pass to broaden the plan. The loop
halts when stability, coverage, and consistency criteria are satisfied or a
budget limit is reached, at which point the agent emits a consolidated answer
and the complete execution trace.
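The policy layer reads naturally as a declarative rule table: diagnostics computed over the evidence buffer either fire refinement actions or permit halting. The thresholds, diagnostic field names, and action labels below are illustrative assumptions, not the agent's actual configuration.

```python
POLICY_RULES = [
    # (condition on diagnostics, refinement action)
    (lambda d: d["coverage"] < 0.5,           "pool_tiers_or_expand_radius"),
    (lambda d: d["confounded"],               "add_covariates_partial_correlation"),
    (lambda d: d["metric_instability"] > 0.2, "sweep_thresholds_or_switch_metric"),
    (lambda d: not d["context_sufficient"],   "think_deeper"),
]

def reflect(diagnostics):
    """Return the refinement actions fired by the current evidence, or halt."""
    actions = [action for condition, action in POLICY_RULES
               if condition(diagnostics)]
    return actions if actions else ["halt"]
```

Keeping the rules declarative means new halting or refinement criteria can be added without touching the planning or execution machinery.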
Reproducibility and
auditability. Every run returns (i)
a natural-language answer grounded in calibrated statistics and (ii) a
machine-readable trace (operator list, parameters, inputs/outputs,
snapshot/seed, query hashes). This design enables bit-for-bit replay,
independent inspection, and straightforward comparison across runs or datasets.
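A minimal sketch of the trace record, assuming canonical JSON serialization and a SHA-256 content hash (the field names are illustrative): hashing a stable serialization of each step means identical runs against the same snapshot and seed yield identical hashes, which is the property bit-for-bit replay relies on.

```python
import hashlib
import json

def trace_record(operator, params, snapshot, seed, inputs, outputs):
    """Serialize one step canonically and attach a content hash."""
    record = {"operator": operator, "params": params, "snapshot": snapshot,
              "seed": seed, "inputs": inputs, "outputs": outputs}
    canonical = json.dumps(record, sort_keys=True)   # stable key order
    record["query_hash"] = hashlib.sha256(canonical.encode()).hexdigest()
    return record

# Same step, same snapshot/seed -> same hash; different params -> different hash.
r1 = trace_record("projection_egonet", {"region": "CLA"}, "snap-01", 0, [], [])
r2 = trace_record("projection_egonet", {"region": "CLA"}, "snap-01", 0, [], [])
r3 = trace_record("projection_egonet", {"region": "MOs"}, "snap-01", 0, [], [])
```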
Performance and
extensibility. Typical multi-step analyses complete end-to-end in about a
minute. Because operators are bound to typed relations, adding a new
modality or attribute (e.g., additional morphology metrics or a new atlas
split) requires only schema exposure; the same planning and reflection
machinery applies without custom code.
Together, these
components turn the KG into an operator-ready substrate and the
natural-language prompt into an auditable workflow—a foundation we leverage in
downstream results for concrete biological case studies and global, tri-modal
surveys.
Automated cross-modal data retrieval and integration for
comprehensive neuronal profiling
Natural language queries automatically
assemble multi-modality profiles of Car3+ claustrum neurons
Figure
3 | Natural language queries automatically assemble
multi-modality profiles of Car3+ claustrum neurons. (A) Multi-step
execution plan generated by AIPOM-CoT. Given the prompt "Can you tell me
something about Car3+ neurons?", the agent automatically decomposes the
query into a sequence of executable steps: (i) search
the knowledge graph for transcriptomic subclasses with Car3 as a marker; (ii)
rank brain regions by Car3-subclass enrichment; (iii) retrieve morphological
reconstructions from the top-ranked region; (iv) analyze projection targets;
and (v) profile molecular composition of targets. Each step in the CoT
(Chain-of-Thought) panel shows the Think → Act cycle, illustrating how natural
language is translated into concrete graph operations without manual scripting.
This automatic task decomposition is the foundation of the
retrieval-to-integration workflow. (B) Automated regional enrichment analysis.
Hypergeometric ranking across all brain regions identifies the claustrum (CLA)
as uniquely enriched for Car3-marked subclasses, accounting for ~43% of
occurrences—far exceeding other regions (e.g., ACAd ~19%, MOs ~14%).
Importantly, CLA was not pre-selected; the system discovered this enrichment
pattern automatically through statistical ranking. Bars show the percentage of
Car3+ subclass cells attributed to each region. (C) Automated spatial
integration of multi-modal data. Left: Spatial distribution of transcriptomic
cells expressing Car3-related markers, co-registered to atlas coordinates.
Right: Representative single-neuron morphological reconstruction from CLA
showing soma location (blue) and long-range axonal arbor (red). The system
automatically retrieves and aligns these distinct data modalities in a common
coordinate frame, enabling direct comparison between molecular and structural
features. This panel demonstrates automated spatial integration without manual
data curation. (D) Automated aggregation of projection targets. Heat map shows
a neuron-by-target matrix compiled from all available CLA reconstructions,
revealing repeatedly hit downstream regions including ENTl, layer-specific
cortical sites (MOs/ACAd L2/3, L5), and insular cortex (AI). Color intensity
indicates relative projection strength. The system automatically aggregates
axonal endpoints across neurons to identify consistent projection
patterns—target discovery is data-driven rather than hypothesis-driven. (E) Automated
molecular fingerprinting of target regions. Stacked bars show the
transcriptomic composition (Class/Subclass/Supertype/Cluster tiers) of the
repeatedly hit targets identified in (D). Cortical targets (MOs/MOp, ACAd, AI)
are dominated by IT-excitatory types (L2/3, L4/5 classes, green/blue), while
entorhinal/retrosplenial sites (ENTl, ENTm, RSP) show mixed IT signatures. The
table groups targets into two functional systems: frontal/motor output and
control, and entorhinal–retrosplenial contextual integration, with
representative molecular markers listed. This automated molecular profiling
reveals functional module organization as a byproduct of the technical
demonstration, suggesting that CLA serves as an associative hub linking motor
control with memory-related networks. (F) Provenance subgraph for replay and
validation. Subgraph extracted from NeuroXiv-KG showing all nodes (regions,
Car3-marked subclasses, targets) and edges (enrichment, projection, molecular
composition) accessed during the workflow. The complete execution
trace—including query strings, thresholds, sample sizes, snapshot/seed, and
operator sequences—enables bit-for-bit replay and independent inspection. This
provenance tracking ensures that the automated workflow is auditable and
reproducible.
To
demonstrate the system's capacity for automated cross-modal analysis, we posed
an open-ended biological question to AIPOM-CoT: "Can you tell me something
about Car3+ neurons?" (Figure 3A). Without manual data curation or
pre-selected datasets, the agent autonomously decomposed this query into a
multi-step analysis plan: (1) identify transcriptomic subclasses enriched for
Car3 expression, (2) determine which brain regions show the highest enrichment
for these subclasses, (3) retrieve morphological reconstructions from the
enriched region, (4) map projection targets of reconstructed neurons, and (5)
characterize the molecular composition of target regions. This automatic task
decomposition—translating a simple natural language question into a structured,
executable workflow—represents the first demonstration of on-demand cross-modal
data integration in neuroscience, where the system determines what information
to retrieve and how to combine it, rather than following pre-specified analysis
scripts.
AIPOM-CoT
first queried the knowledge graph to identify transcriptomic subclasses
expressing Car3, discovering the "003 CLA-EPd-CTX
Car3 Glut" subclass as the primary Car3-expressing population. The agent
then automatically ranked brain regions by their enrichment for this subclass,
revealing that the claustrum (CLA) contains approximately 43% of cells from
this subclass—substantially higher than any other region examined (Figure 3B).
Notably, the system identified this CLA enrichment through automated retrieval
and quantification across the entire knowledge graph, not through
hypothesis-driven selection or manual literature review. This demonstrates the
system's ability to discover region-specific cellular enrichments on demand,
assembling quantitative profiles from integrated molecular datasets without
requiring users to know in advance where interesting patterns might emerge.
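The hypergeometric ranking behind this step (Figure 3B) can be sketched with the standard tail probability: given N profiled cells of which K belong to the Car3-marked subclass, a region contributing k of its n sampled cells to that subclass is scored by P(X ≥ k). The counts below are toy values chosen only to echo the proportions in Figure 3B, and the function names are ours.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, upper + 1)) / comb(N, n)

def rank_regions(region_counts, K, N):
    """Rank regions by hypergeometric enrichment p-value (most enriched first)."""
    scored = [(region, hypergeom_sf(k, N, K, n))
              for region, (k, n) in region_counts.items()]
    return sorted(scored, key=lambda item: item[1])

# Toy counts: region -> (subclass cells observed in region, cells sampled there).
ranked = rank_regions({"CLA": (215, 500), "ACAd": (95, 500), "MOs": (70, 500)},
                      K=500, N=10000)
# ranked[0][0] == "CLA"
```

Ranking by tail probability rather than raw percentage keeps small, densely sampled regions and large, sparsely sampled ones on the same footing.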
Having
identified CLA as the region of maximal Car3+ subclass enrichment, AIPOM-CoT
next retrieved morphological reconstruction data for CLA neurons from the
knowledge graph (Figure 3C, left). The system then automatically extracted
projection information from these reconstructions, identifying the target
subregions innervated by CLA neurons (Figure 3C, right; Figure 3D). The
projection analysis revealed a complex, heterogeneous pattern: CLA neurons
target multiple cortical areas with varying strengths, including frontal,
motor, and sensory regions, as well as select subcortical structures. This
automated retrieval-to-integration workflow—moving seamlessly from
transcriptomic identity to regional localization to morphological data to
projection mapping—occurs entirely through natural language interaction,
without requiring the user to manually navigate different data modalities,
specify join operations, or write custom analysis code.
To
complete the multi-modality profile, AIPOM-CoT automatically generated
molecular fingerprints for each CLA projection target by querying cell type
composition across target regions (Figure 3E). This analysis revealed the
molecular diversity of the areas receiving CLA input, with different targets
displaying distinct cellular compositions—some dominated by specific cortical
layer markers, others by particular GABAergic
subtypes. The system organized this information into an interpretable summary,
including a knowledge graph visualization showing the relationships among Car3+
neurons, their projection targets, and the molecular characteristics of those
targets (Figure 3F). Critically, all steps—from initial query to final
integration—were executed automatically with full provenance tracking,
recording the specific knowledge graph queries, data sources, and computational
parameters used at each stage. This enables independent validation and
iterative refinement of the analysis.
While
the primary contribution of this demonstration is the automated
retrieval-to-integration capability itself, the analysis generated, as a
byproduct, an integrated multi-modal profile of Car3+ CLA neurons. The
substantial CLA enrichment of Car3+ cells, combined with their widespread
cortical and selective subcortical projections, is suggestive of a role in
broad cortical coordination and cross-modal integration—functions previously
hypothesized for the claustrum. The molecular heterogeneity of CLA target regions
further suggests that Car3+ neurons may differentially modulate distinct
cortical processing streams. However, testing these hypotheses would require
targeted experimental validation beyond the scope of automated knowledge graph
analysis.
The
key advance demonstrated here is not the specific biological findings about
Car3+ neurons, but rather the system's ability to assemble comprehensive
multi-modal neuronal profiles on demand, in response to arbitrary natural
language queries, without manual data curation. The entire analysis—from
question to integrated summary—was completed in approximately 60 seconds,
compared to hours or days that would be required for manual cross-modal data
integration using conventional approaches. This represents the first system
capable of translating open-ended biological questions into automated,
multi-step cross-modal retrieval workflows that span molecular, morphological,
and connectivity data, delivering integrated results with full provenance for
reproducibility and validation.
Automated whole-brain
tri-modal fingerprinting systematically reveals regions with cross-modal
concordance versus divergence
Figure
4 | Automated whole-brain tri-modal fingerprinting
systematically reveals regions with cross-modal concordance versus divergence. (A) Whole-brain
pairwise similarity matrices for morphology (left), molecular (center), and
projection (right) fingerprints across 32 brain regions. Morphology and
projection fingerprints display prominent block structures (red-orange blocks),
indicating functional modularity where certain region groups share similar
cellular morphologies or projection patterns. In striking contrast, the
molecular fingerprint matrix is dominated by low inter-region similarity
(blue), except along the diagonal, revealing that molecular composition is
highly region-specific. (B) Cross-modal mismatch indices quantify the
divergence between modalities. Left: Molecule-Morphology Divergence matrix
shows region pairs where molecular composition diverges from morphological
organization. Right: Molecule-Projection Divergence matrix reveals pairs where
molecular similarity fails to predict projection patterns. Approximately 40% of region pairs display systematic cross-modal
divergence (warm colors), demonstrating that concordance across modalities is
not the norm. (C) Exemplar case of Molecule-Morphology Divergence: LHA versus
TU. Left: Morphology feature profiles (radar plot) show nearly identical
dendritic arbor characteristics between lateral hypothalamic area (LHA) and
tuberal nucleus (TU). Right: Despite morphological similarity, their top
neuronal subtypes are completely distinct. LHA contains diverse
neuropeptidergic and neurotransmitter-defined populations (Skor1+
glutamatergic, Foxb1+ glutamatergic, Pitx2+ glutamatergic neurons), reflecting
its role as a multifunctional hypothalamic hub regulating arousal, feeding, and
motivated behavior. TU, by contrast, comprises a more specialized GABAergic
population involved in hedonic feeding control. (D) Exemplar case of Projection
Convergence despite Molecular Divergence: MOs versus LHA. Left: Projection
target profiles reveal substantial overlap in target structures, with both secondary
motor cortex (MOs) and lateral hypothalamic area (LHA) projecting to striatum
(STR, CP, ACB), thalamus (MD, VM, ZI), and brainstem (PAG, MRN, MB). While the
relative strengths differ—MOs projects more strongly to cortical and striatal
motor areas while LHA emphasizes brainstem arousal centers—the shared target
set reflects convergent control architecture where multiple brain systems
coordinate behavior through common downstream effectors. Right: Despite this
projection convergence, their molecular compositions are completely distinct.
MOs comprises laminar-organized cortical glutamatergic subtypes (IT-type,
ET-type, CT-type neurons marked by layer-specific genes) along with GABAergic
interneurons. LHA contains hypothalamic glutamatergic neurons marked by
region-specific transcription factors (Skor1+, Foxb1+, Pitx2+) and
neuropeptidergic signatures.
Having established automated cross-modal integration for targeted
queries, we next asked whether our system could perform systematic, whole-brain
analyses of cross-modal relationships. We instructed AIPOM-CoT to construct
tri-modal fingerprints (molecular, morphological, and projection) for all 32
major brain regions in our knowledge graph and compute pairwise similarity
matrices for each modality (Figure 4A).
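As a minimal sketch of this step, assuming each region is summarized by one fingerprint vector per modality and that pairwise similarity is cosine similarity (the metric is an assumption; the text does not specify it). Random placeholder arrays stand in for the real fingerprints, and the feature dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fingerprints: one row per region, one column per feature.
# Real fingerprints would be derived from the knowledge graph.
n_regions = 32
fingerprints = {
    "morphology": rng.random((n_regions, 12)),
    "molecular":  rng.random((n_regions, 50)),
    "projection": rng.random((n_regions, 30)),
}

def pairwise_cosine(X):
    """Region-by-region cosine-similarity matrix for one modality."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

# One n_regions x n_regions similarity matrix per modality (as in Figure 4A).
similarity = {m: pairwise_cosine(F) for m, F in fingerprints.items()}
```

Each resulting matrix is symmetric with a unit diagonal, so block structure off the diagonal directly reflects groups of mutually similar regions.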
The resulting similarity patterns revealed a striking contrast.
Morphology and projection fingerprints displayed prominent block structures,
with certain region groups showing high mutual similarity (e.g., cortical areas
in morphology, hypothalamic subregions in projection patterns). This modular
organization suggests that functional relatedness at morphological or
connectivity levels often transcends strict anatomical boundaries. In stark
contrast, the molecular fingerprint matrix was dominated by low inter-region similarity, with high values confined almost exclusively to the
diagonal. This fundamental asymmetry—modular organization at morphological and
projection levels versus regional specificity at the molecular level—indicates
that molecular composition is largely region-specific, whereas morphological
and projection properties can converge across anatomically distant regions.
To quantify how often regions display concordance versus divergence
across modalities, we computed cross-modal mismatch indices for all region
pairs (Figure 4B). The Molecule-Morphology Divergence matrix identified pairs
where molecular composition diverges from morphological organization, while the
Molecule-Projection Divergence matrix highlighted cases where molecular
profiles do not align with projection patterns. Approximately 40% of region pairs exhibited systematic cross-modal divergence,
demonstrating that multi-modal concordance—where regions similar in one
modality are also similar in another—is not the norm. Instead, brain
organization operates along semi-independent axes, with morphological and
projection similarities often arising independently of molecular composition.
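One simple way to instantiate such a mismatch index is the element-wise absolute difference between two modality similarity matrices; this definition, the placeholder similarity matrices, and the divergence threshold below are all illustrative assumptions, not the exact computation performed by AIPOM-CoT.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_similarity(n, rng):
    """Toy symmetric similarity matrix with unit diagonal (placeholder data)."""
    A = rng.random((n, n))
    S = (A + A.T) / 2.0
    np.fill_diagonal(S, 1.0)
    return S

def mismatch_index(sim_a, sim_b):
    """Cross-modal divergence as |similarity difference|, element-wise."""
    return np.abs(sim_a - sim_b)

n_regions = 32
sim_molecular = random_similarity(n_regions, rng)
sim_morphology = random_similarity(n_regions, rng)

div = mismatch_index(sim_molecular, sim_morphology)

# Fraction of distinct region pairs (upper triangle) whose divergence
# exceeds an arbitrary illustrative threshold.
iu = np.triu_indices(n_regions, k=1)
frac_divergent = float((div[iu] > 0.5).mean())
```

Ranking region pairs by such an index is what surfaces exemplar cases like LHA versus TU, where one modality agrees while another diverges.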
To illustrate the biological significance of these divergence patterns,
we examined two representative cases automatically identified by AIPOM-CoT.
First, we analyzed the lateral hypothalamic area (LHA) and tuberal nucleus
(TU), which display high morphological similarity but marked molecular
divergence (Figure 4C). Their morphological feature profiles—including axon
length, branching complexity, and dendritic arbor characteristics—are nearly
identical, suggesting similar local circuit integration strategies. However,
their neuronal compositions differ dramatically. LHA contains a highly
heterogeneous population including Skor1+, Foxb1+, and Pitx2+ glutamatergic
neurons, along with specialized neuropeptidergic subtypes (e.g., orexin,
melanin-concentrating hormone) that support its multifunctional role as a
hypothalamic hub regulating arousal, feeding, reward, and motivated behavior.
TU, by contrast, is dominated by a more homogeneous GABAergic population
specialized for hedonic feeding regulation, receiving specific neurotensin
inputs from the lateral septum.
This example demonstrates functional convergence at the morphological
level despite molecular divergence. The morphological similarity likely
reflects shared computational demands—both regions process feeding-related
signals and may adopt similar dendritic architectures for local integration.
Yet their molecular identities diverge, reflecting distinct developmental
origins (LHA neurons arise from multiple progenitor domains; TU neurons derive
from ventromedial hypothalamic progenitors) and functional specializations (LHA
as a multifunctional integration hub versus TU as a specialized circuit node).
Thus, morphological organization can converge across regions to support related
functions while molecular composition remains tied to developmental history and
cell-type-specific roles.
The second case examined secondary motor cortex (MOs) versus LHA, which
display projection convergence despite complete molecular divergence (Figure
4D). Despite entirely different molecular compositions—MOs comprises
laminar-organized cortical glutamatergic neurons (IT-type, ET-type, CT-type)
while LHA contains hypothalamic glutamatergic neurons marked by Skor1, Foxb1,
and Pitx2—both regions project to a largely overlapping set of targets
including striatum (STR, CP, ACB), thalamus (MD, VM, ZI), and brainstem
structures (PAG, MRN, MB). While the relative projection strengths differ, with
MOs emphasizing cortical-striatal motor loops and LHA emphasizing
hypothalamic-brainstem arousal pathways, the substantial target overlap
reflects a convergent control architecture where multiple brain systems
coordinate behavior through shared downstream effectors.
This convergence reveals a fundamental organizational principle:
molecularly distinct neuronal populations from evolutionarily and
developmentally disparate systems can converge onto common targets to enable
multi-system behavioral coordination. MOs, arising from Emx1+ cortical
progenitors and organized into laminar glutamatergic subtypes, provides learned
motor programs and flexible sensorimotor control—answering 'how to move'. LHA,
arising from hypothalamic progenitor domains and expressing region-specific
transcription factors and neuropeptides, provides innate motivational drives
and arousal states—answering 'why to move'. Both systems project to striatum,
but with different functional contributions: MOs specifies motor sequences and
habits, while LHA gates movement with motivational vigor and urgency.
Similarly, both project to PAG—MOs for the motor execution of defensive
behaviors, LHA for the arousal and affective components of defense.
This convergent architecture allows the brain to integrate
phylogenetically ancient mechanisms (hypothalamic drives) with recently evolved
capacities (cortical motor control) through shared intermediate structures.
Projection patterns thus reflect functional integration demands and circuit
roles rather than being determined by neurotransmitter class or molecular
identity. The molecular divergence between MOs and LHA neuronal
populations—reflecting distinct developmental programs, evolutionary histories,
and specialized signaling repertoires—does not prevent them from accessing
similar downstream targets, demonstrating that connectivity is organized by
functional requirements that transcend molecular boundaries.